Density-Based Clustering Based on Probability Distribution for Uncertain Data
Abstract
Large volumes of uncertain digital data are now being produced, and handling such data is difficult. The distance between uncertain object descriptions is commonly expressed by a single numerical distance value. Clustering is one of the essential and challenging tasks in mining uncertain data. Previous methods extend partitioning clustering methods such as k-means and density-based clustering methods such as DBSCAN to uncertain data, relying on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable (such as weather data across the world at the same time). In this paper, we model uncertain objects in both continuous and discrete domains using probability distributions. We use Kullback-Leibler divergence to measure the similarity between uncertain objects in both the continuous and discrete cases, and integrate it into partitioning and density-based clustering methods to cluster uncertain objects. We first identify the uncertain objects and cluster them using partitioning-based clustering. The remaining data are then clustered using any traditional clustering method.
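The idea of replacing geometric distance with Kullback-Leibler divergence inside a density-based method can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each uncertain object is a discrete probability distribution over the same finite domain, uses a symmetrized divergence KL(p||q) + KL(q||p) so that the neighborhood test does not depend on argument order, and floors zero probabilities at a small constant. The names `kl_divergence`, `symmetric_kl`, `region_query`, and `dbscan_kl`, and the parameters `eps_radius`, `min_pts`, and `floor`, are hypothetical choices made for this illustration.

```python
import math


def kl_divergence(p, q, floor=1e-12):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Assumption: zero entries of q are floored at `floor` to keep the value finite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must share the same domain")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, floor))
    return total


def symmetric_kl(p, q):
    """Symmetrized divergence (an assumption of this sketch, not taken from the paper)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def region_query(objects, idx, eps_radius):
    """Indices of all objects whose divergence from objects[idx] is at most eps_radius."""
    return [j for j, other in enumerate(objects)
            if symmetric_kl(objects[idx], other) <= eps_radius]


def dbscan_kl(objects, eps_radius, min_pts):
    """DBSCAN-style clustering with KL divergence in place of geometric distance.

    Returns one label per object: 0, 1, ... for clusters and -1 for noise.
    """
    noise = -1
    labels = [None] * len(objects)  # None means "not visited yet"
    cluster = 0
    for i in range(len(objects)):
        if labels[i] is not None:
            continue
        neighbors = region_query(objects, i, eps_radius)
        if len(neighbors) < min_pts:
            labels[i] = noise
            continue
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == noise:
                labels[j] = cluster  # former noise becomes a border object
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(objects, j, eps_radius)
            if len(j_neighbors) >= min_pts:  # j is a core object: keep expanding
                seeds.extend(j_neighbors)
        cluster += 1
    return labels


if __name__ == "__main__":
    # Made-up distributions over a three-value domain, for illustration only.
    objects = [
        [0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.72, 0.18, 0.10],
        [0.10, 0.20, 0.70], [0.10, 0.25, 0.65], [0.12, 0.18, 0.70],
        [0.34, 0.33, 0.33],
    ]
    print(dbscan_kl(objects, eps_radius=0.05, min_pts=3))
```

In this sketch two objects are neighbors when their distributions are similar, regardless of where the objects lie geometrically, which is the property the abstract relies on for geometrically indistinguishable objects. A continuous-domain variant would need a density estimate for each object before the divergence can be evaluated, which is outside the scope of this example.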
Similar resources
Technique For Clustering Uncertain Data Based On Probability Distribution Similarity
Clustering on uncertain data is one of the essential tasks in data mining. Traditional algorithms such as K-Means clustering, UK-Means clustering, and density-based clustering are limited to geometric-distance-based similarity measures when clustering uncertain data and cannot capture the differences between uncertain data and their distributions. Such methods cannot handle uncertain objects...
Clustering Multi-Attribute Uncertain Data using Probability Distribution
Clustering is an unsupervised classification technique for grouping a set of abstract objects into classes of similar objects. Clustering uncertain data is one of the essential tasks in mining uncertain data. Uncertain data is typically found in areas such as sensor networks, weather data, and customer rating data. The earlier methods for clustering uncertain data based on probability distribution,...
Clustering on Uncertain Data using Kullback Leibler Divergence Measurement based on Probability Distribution
Cluster analysis is one of the important data analysis methods and is a very complex task. It is the art of detecting groups of similar objects in large data sets without requiring groups to be specified by means of explicit features or knowledge of the data. Clustering on uncertain data is a very difficult task, both in modeling similarity between uncertain data objects and in developing efficient computat...
Probability Density Grid-based Online Clustering for Uncertain Data Streams
Most existing stream clustering algorithms adopt an online component and an offline component. The disadvantage of such two-phase algorithms is that they cannot generate the final clusters online, and accurate clustering results must be obtained through offline analysis. Furthermore, the clustering algorithms for uncertain data streams are unable to find clusters of arbitrary shapes accordi...
Implementation of clustering of uncertain data on probability distribution similarity
Clustering on uncertain data, one of the essential tasks in mining uncertain data, poses significant challenges in both modeling similarity between uncertain objects and developing efficient computational methods. The previous methods extend traditional partitioning clustering methods like k-means and density-based clustering methods like DBSCAN to uncertain data, and thus rely on geometric distanc...